Towards Hierarchical Prosodic Prominence Generation in TTS Synthesis

Authors

Leonardo Badino

Robert A. J. Clark

Abstract

We address the problem of identifying (from text) and generating pitch accents in HMM-based English TTS synthesis. We show, through a large-scale perceptual test, that a large improvement in the binary discrimination between pitch-accented and non-accented words has no effect on the quality of the speech generated by the system. On the other hand, adding a third accent type that emphatically marks words conveying "contrastive" focus (automatically identified from text) produces beneficial effects on the synthesized speech. These results support accounts of prosodic prominence that treat the prosodic patterns of utterances as hierarchically structured, and they point out the limits of flattening such a structure into a simple accent/non-accent distinction.

Similar resources

A Data-driven Adaptation of Prosody in a Multilingual TTS

Proper accentuation and phrasing make the syntactic and semantic structure of the message more transparent to the listener. Therefore a good modeling of prosody in a TTS system has to be structured into appropriate levels. The implemented prosodic hierarchy should guide the listeners’ attention and help in support of the comprehension process. Since prosody functions as a distractor, it is very...

Identifying prosodic prominence patterns for English text-to-speech synthesis

This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems the prediction from text of prosodic prominence relations between words in an utterance relies on features that very loosely account for the combined effects of syntax, semantics, word i...

Designing prosodic databases for automatic modelling in 6 languages

We describe the design and creation of prosodic speech databases for 6 languages. The purpose of the databases is to allow derivation of prosody models in order to improve TTS synthesis. The main prosodic variables to model were word prominence, prosodic boundary strength and phone duration. We describe the database structure and contents and the methodology for creating prosodic databases, and...

An Environment for Word Prominence Classification in Slovenian Language

Besides phrasing, prominence is one of the most important parameters of speech prosody to model. So-called data-driven approaches now seem to be the appropriate solution for prosody modeling in current text-to-speech (TTS) systems. They allow prosodic regularities to be automatically extracted from a prosodic database of natural speech. In this paper we’ll present an evaluation of suit...

Prominence detected by listeners for future speech synthesis application

The aim of the present investigation is to identify, and give a pilot statistical presentation of, the prominence distinguished by native speakers in read-aloud texts taken from the Russian corpus for text-to-speech unit-selection synthesis. The TTS system uses the linguistic information encoded in the input text. Therefore the parameters which are easily extracted from the text ...

Publication date: 2012
